INTRODUCTION


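A general constrained optimization problem can be given as

$$\min_{x \in \Omega} f(x), \qquad \Omega = \{\, x \in \mathbb{R}^n \mid c_i(x) = 0,\ i \in \mathcal{E}, \quad c_i(x) \ge 0,\ i \in \mathcal{I} \,\}.$$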


Newton's method, which is much less affected by ill-conditioning provided the linear system at each iteration is solved directly, is affected by having its domain of attraction shrink; hence the importance of the continuation technique.
1 Penalty Method 


In this section we consider the problem 


$$\min_x f(x) \tag{1a}$$
$$\text{s.t. } c_i(x) = 0, \quad i = 1, 2, \ldots, m. \tag{1b}$$

Define

$$A^T(x) = [\nabla c_1(x), \ldots, \nabla c_m(x)] \tag{1c}$$

and assume that $A$ has full row rank $m < n$.


Recall that the Lagrangian is defined by 


$$L(x, \lambda) = f(x) - \sum_{i=1}^m \lambda_i c_i(x) \tag{1d}$$

and that the KKT conditions require, in addition to (1b), that at $(x^*, \lambda^*)$,

$$\nabla_x L = \nabla f(x) - \sum_{i=1}^m \lambda_i \nabla c_i(x) = 0.$$
In the quadratic penalty method we solve a sequence of unconstrained minimization problems of the form 


$$\min_x \phi(x; \mu) = f(x) + \frac{1}{2\mu} \sum_{i=1}^m c_i^2(x) \tag{2}$$

for a sequence of values $\mu = \mu_k \downarrow 0$. We can use, for instance, the solution $x^*(\mu_{k-1})$ of the $(k-1)$st unconstrained problem as an initial guess for the unconstrained problem (2) with $\mu = \mu_k$. This is a simple continuation technique.


It is hard to imagine anything simpler to intuit. Unfortunately, however, the problem (2) becomes ill-conditioned as $\mu$ gets small. Both BFGS and CG methods become severely affected by this. An algorithmic framework for such a method reads:

Given $\mu_0 > 0$, and for $k = 0, 1, 2, \ldots$

    Starting with $x_k$, solve (2) for $\mu = \mu_k$, terminating when
        $\|\nabla \phi(x; \mu_k)\| \le \tau_k$,
    where $\tau_k \downarrow 0$. Call the result $x_k^*$.

    If final convergence test holds (e.g. $\tau_k \le$ tol) exit

    Else

        Choose $\mu_{k+1} \in (0, \mu_k)$

        Choose a new starting point $x_{k+1}$, e.g. $x_{k+1} = x_k^*$.

End


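The choice of how to decrease $\mu$ can depend on how difficult it has been to solve the previous subproblem, e.g.,

$$\mu_{k+1} = \begin{cases} 0.7\,\mu_k & \text{if } \min \phi(x; \mu_k) \text{ was hard,} \\ 0.1\,\mu_k & \text{if } \min \phi(x; \mu_k) \text{ was easy.} \end{cases}$$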
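The framework above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test problem (a convex quadratic with one linear constraint), the helper names, the fixed number of outer iterations used in place of a final convergence test, and the "more than 9 Newton iterations means hard" rule are all choices made here, not part of the method itself.

```python
import math

# Hypothetical test problem (an assumption, not from the text):
#   min f(x) = x1^2 + 2*x2^2   s.t.  c(x) = x1 + x2 - 1 = 0,
# with exact solution x* = (2/3, 1/3) and multiplier lambda* = 4/3.

def c(x):
    return x[0] + x[1] - 1.0

def grad_phi(x, mu):
    # gradient of phi(x; mu) = f(x) + c(x)^2 / (2*mu); grad c = (1, 1)
    r = c(x) / mu
    return [2.0 * x[0] + r, 4.0 * x[1] + r]

def hess_phi(mu):
    # Hessian of phi; constant here because f is quadratic and c is linear
    t = 1.0 / mu
    return [[2.0 + t, t], [t, 4.0 + t]]

def newton_solve(x, mu, tau, max_iter=50):
    """Minimize phi(.; mu) by Newton's method until ||grad phi|| <= tau."""
    iters = 0
    g = grad_phi(x, mu)
    while math.hypot(g[0], g[1]) > tau and iters < max_iter:
        H = hess_phi(mu)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # solve H p = -g by Cramer's rule (2x2 system)
        p0 = (-g[0] * H[1][1] + g[1] * H[0][1]) / det
        p1 = (-g[1] * H[0][0] + g[0] * H[1][0]) / det
        x = [x[0] + p0, x[1] + p1]
        g = grad_phi(x, mu)
        iters += 1
    return x, iters

def penalty_method(x0, tol=1e-6, n_outer=7):
    x, history = list(x0), []
    mu = 1.0
    for _ in range(n_outer):                  # assumed: fixed outer count
        x, iters = newton_solve(x, mu, tol)   # warm start = continuation
        lam_est = -c(x) / mu                  # multiplier estimate -c/mu
        history.append((mu, iters, lam_est))
        mu *= 0.7 if iters > 9 else 0.1       # "hard" vs "easy" update
    return x, history

x, history = penalty_method([0.0, 0.0])
for mu, iters, lam_est in history:
    print(f"mu={mu:.0e}  newton_iters={iters}  lambda_est={lam_est:.8f}")
```

For this particular problem the subproblem minimizer is available in closed form, so the estimate $-c/\mu = 4/(4\mu + 3)$ can be seen approaching $4/3$ only at a rate proportional to $\mu$.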


When comparing the gradients of the unconstrained objective function $\phi(x; \mu)$ of (2) and the Lagrangian $L(x, \lambda)$ of (1d) it appears that $-c_i/\mu$ has replaced $\lambda_i$. Indeed, it can be shown that if $\tau_k \to 0$ then $x_k^* \to x^*$ and

$$-\frac{c_i(x_k^*)}{\mu_k} \to \lambda_i^*, \quad i = 1, 2, \ldots, m. \tag{3}$$
Example 1
Let us consider the minimization of two objective functions under the constraint

$$x^T x = 1.$$

For each value of the penalty parameter encountered, we define the function $\phi$ of (2) and minimize it using the line search option and tolerance $\tau_k$ = tol = 1.e-6. If more than 9 iterations are needed then the update is $\mu \leftarrow 0.7\mu$, otherwise it is $\mu \leftarrow 0.1\mu$.


For the quadratic 4-variable objective function we start with the unconstrained minimizer, $x_0 = (1, 0, -1, \ldots)^T$, and obtain convergence after a total of 44 damped Newton iterations to

$$x^* \approx (-0.02477,\ 0.31073,\ 0.78876,\ 0.52980)^T.$$

The objective value increases from the unconstrained (and infeasible) $f(x_0) \approx -167.28$ to $f(x^*) = -133.56$.


The resulting penalty parameter sequence was

$\mu$ = 1, .1, .01, ..., 1.e-8,

i.e., all subproblems encountered were deemed "easy", even though for one subproblem a damped Newton step (i.e. step size < 1) was needed. The final approximation for the Lagrange multiplier is

$$(1 - x^T x)/10^{-8} \approx -40.94.$$


For the non-quadratic objective function,

$$f(x) = [1.5 - x_1(1 - x_2)]^2 + [2.25 - x_1(1 - x_2^2)]^2 + [2.625 - x_1(1 - x_2^3)]^2,$$

we start with $x_0 = \frac{\sqrt{2}}{2}(1, 1)^T$, which satisfies the constraint. Convergence to $x^* \approx (0.99700, -0.07744)^T$ is reached after a total of 39 iterations. The objective value is $f(x^*) \approx 4.42$, up from 0 for the unconstrained minimum.


The penalty parameter sequence was

$\mu$ = 1, .1, .01, .001, 7.e-4, 7.e-5, ..., 7.e-8,

but the path to convergence was more tortuous than these numbers indicate, as solution of the 3rd and 4th subproblems failed. The final approximation for the Lagrange multiplier is

$$(1 - x^T x)/(7 \times 10^{-8}) \approx -3.35.$$
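The figures reported for the non-quadratic objective can be checked from the rounded minimizer alone. The sketch below assumes the objective has the form commonly known as Beale's function, with coefficients 1.5, 2.25 and 2.625, and that the constraint is written as $c(x) = x^T x - 1$ with the Lagrangian sign convention of (1d); the function and variable names are illustrative.

```python
def f(x1, x2):
    """Non-quadratic objective of Example 1 (assumed Beale-type form)."""
    return ((1.5 - x1 * (1 - x2)) ** 2
            + (2.25 - x1 * (1 - x2 ** 2)) ** 2
            + (2.625 - x1 * (1 - x2 ** 3)) ** 2)

def grad_f(x1, x2):
    r1 = 1.5 - x1 * (1 - x2)
    r2 = 2.25 - x1 * (1 - x2 ** 2)
    r3 = 2.625 - x1 * (1 - x2 ** 3)
    g1 = -2 * (r1 * (1 - x2) + r2 * (1 - x2 ** 2) + r3 * (1 - x2 ** 3))
    g2 = 2 * x1 * (r1 + 2 * x2 * r2 + 3 * x2 ** 2 * r3)
    return g1, g2

x1, x2 = 0.99700, -0.07744        # reported constrained minimizer (rounded)
g1, g2 = grad_f(x1, x2)

# KKT with c(x) = x^T x - 1: grad f = lambda * grad c = 2 * lambda * x,
# so each component gives an independent estimate of lambda.
lam_from_x1 = g1 / (2 * x1)
lam_from_x2 = g2 / (2 * x2)

print("f at (3, 0.5)    :", f(3.0, 0.5))   # unconstrained minimum
print("f at x*          :", f(x1, x2))
print("x*^T x* - 1      :", x1 * x1 + x2 * x2 - 1)
print("lambda estimates :", lam_from_x1, lam_from_x2)
```

Both component-wise estimates of the multiplier agree with each other and with the value obtained from the penalty formula, which is a useful consistency check on the sign convention.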
To understand the nature of the ill-conditioning better, note that the Hessian of $\phi$ of (2) is

$$\nabla^2 \phi(x; \mu_k) = \nabla^2 f + \frac{1}{\mu_k} A^T(x) A(x) + \frac{1}{\mu_k} \sum_{i=1}^m c_i(x) \nabla^2 c_i(x) \approx \nabla^2 L + \frac{1}{\mu_k} A^T A.$$

The matrix $\frac{1}{\mu_k} A^T A$ has $n - m$ zero eigenvalues as well as $m$ eigenvalues with size $O(\mu_k^{-1})$. So, we have an unholy mixture of very large and zero eigenvalues. This could give trouble even for Newton's method.
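The eigenvalue structure just described is easy to observe numerically. The following sketch uses a hypothetical problem with $n = 2$ and $m = 1$ (objective $x_1^2 + 2x_2^2$, constraint $x_1 + x_2 - 1 = 0$, so $A = [1, 1]$); the helper `sym2x2_eigs` is an illustrative closed-form eigenvalue routine written for this example.

```python
import math

def sym2x2_eigs(a, b, d):
    """Eigenvalues (small, large) of the symmetric matrix [[a, b], [b, d]]."""
    half_trace = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    large = half_trace + radius
    # use det / large to avoid cancellation in half_trace - radius
    small = (a * d - b * b) / large
    return small, large

# Assumed problem: Hessian of phi is diag(2, 4) + A^T A / mu with A = [1, 1].
for k in range(0, 9, 2):
    mu = 10.0 ** (-k)
    t = 1.0 / mu
    pen_small, pen_large = sym2x2_eigs(t, t, t)         # A^T A / mu alone
    small, large = sym2x2_eigs(2.0 + t, t, 4.0 + t)     # Hessian of phi
    print(f"mu={mu:.0e}  eig(A^T A/mu)=({pen_small:.1e}, {pen_large:.1e})  "
          f"cond(Hess phi)={large / small:.2e}")
```

Here $A^T A / \mu$ has one zero eigenvalue ($n - m = 1$) and one equal to $2/\mu$, while the smallest eigenvalue of the full Hessian stays near 3, so the condition number grows like $1/\mu$.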


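Fortunately, for Newton's iteration, to find the next direction $p$ we can write the linear system in augmented form (verify!),

$$\begin{pmatrix} \nabla^2 f + \sum_{i=1}^m \frac{c_i(x)}{\mu_k} \nabla^2 c_i(x) & A^T(x) \\ A(x) & -\mu_k I \end{pmatrix} \begin{pmatrix} p \\ \xi \end{pmatrix} = \begin{pmatrix} -\nabla \phi(x; \mu_k) \\ 0 \end{pmatrix}.$$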
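One way to carry out the requested verification is to eliminate the auxiliary unknown $\xi$ from the augmented system. The short derivation below is a sketch using only the Hessian of $\phi$ given earlier; it introduces no new symbols.

```latex
% Second block row of the augmented system: A(x) p - \mu_k \xi = 0, hence
\xi = \frac{1}{\mu_k} A(x)\, p .
% Substituting \xi into the first block row gives
\Big[ \nabla^2 f + \frac{1}{\mu_k} \sum_{i=1}^m c_i(x) \nabla^2 c_i(x)
      + \frac{1}{\mu_k} A^T(x) A(x) \Big]\, p = -\nabla \phi(x; \mu_k),
% which is exactly \nabla^2 \phi(x; \mu_k)\, p = -\nabla \phi(x; \mu_k),
% the Newton equation for the penalty function (2).
```

The large term $\frac{1}{\mu_k} A^T A$ therefore never has to be formed explicitly: $\mu_k$ enters the augmented matrix only through the block $-\mu_k I$ and the scaled constraint curvature.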


The matrix of this augmented system tends towards the KKT matrix as $\mu_k \to 0$, and all is well in the limit.

Finally, we note for later purposes that instead of the quadratic penalty function (2) the function

$$\phi_1(x; \mu) = f(x) + \frac{1}{\mu} \sum_{i=1}^m |c_i(x)| \tag{4}$$

could be considered. This is an exact penalty function: for sufficiently small $\mu > 0$ one minimization yields the optimal solution.


Unfortunately, that one unconstrained minimization problem turns out in general practice to be harder to solve than applying the continuation method presented before with the quadratic penalty function (2). But the function (4) also has other uses, namely, as a merit function. Thus, (4) can be used to assess the quality of iterates obtained by some other method for constrained optimization.
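The exactness property of (4) can be seen on a one-variable example. The sketch below is an illustration under assumptions: the problem ($\min x^2$ subject to $x - 1 = 0$, so $x^* = 1$ and $\lambda^* = 2$) is hypothetical, and a derivative-free golden-section search is used here because (4) is not differentiable at the solution. For this problem the exact penalty recovers $x^*$ as soon as $1/\mu > |\lambda^*|$, whereas the quadratic penalty minimizer $1/(1 + 2\mu)$ only approaches it.

```python
import math

def golden_section(fun, lo, hi, tol=1e-10):
    """Minimize a unimodal 1-D function on [lo, hi]; no derivatives needed."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    x1 = b - inv_phi * (b - a)
    x2 = a + inv_phi * (b - a)
    f1, f2 = fun(x1), fun(x2)
    while b - a > tol:
        if f1 <= f2:                      # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - inv_phi * (b - a)
            f1 = fun(x1)
        else:                             # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + inv_phi * (b - a)
            f2 = fun(x2)
    return 0.5 * (a + b)

# Hypothetical 1-D problem: min x^2  s.t.  c(x) = x - 1 = 0.
def f(x):
    return x * x

def c(x):
    return x - 1.0

def phi_l1(x, mu):          # exact penalty function of the form (4)
    return f(x) + abs(c(x)) / mu

def phi_quad(x, mu):        # quadratic penalty function of the form (2)
    return f(x) + c(x) ** 2 / (2.0 * mu)

for mu in (1.0, 0.25, 0.01):
    x_l1 = golden_section(lambda x: phi_l1(x, mu), -2.0, 3.0)
    x_q = golden_section(lambda x: phi_quad(x, mu), -2.0, 3.0)
    print(f"mu={mu:<5} l1 minimizer={x_l1:.8f}  quadratic minimizer={x_q:.8f}")
```

With $\mu = 1$ the penalty weight is too small and the minimizer of (4) sits at 0.5; for $\mu = 0.25$ and below it sits at the kink $x = 1$.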
2 Barrier Method 


We now consider only inequality constraints,

$$\min_x f(x) \tag{5a}$$
$$\text{s.t. } c_i(x) \ge 0, \quad i = 1, 2, \ldots, m. \tag{5b}$$

We may or may not have $m > n$, but we use the same notation $A(x)$ as in (1c), dropping the requirement of a full row rank. In the log barrier method we solve a sequence of unconstrained minimization problems of the form

$$\min_x \psi(x; \mu) = f(x) - \mu \sum_{i=1}^m \log c_i(x) \tag{6}$$

for a sequence of values $\mu = \mu_k \downarrow 0$.

Starting with a feasible $x_0$ in the interior of $\Omega$ we always stay strictly in the interior of $\Omega$, so this is an interior point method. This feasibility is a valuable property if we stop before reaching the optimum.




Example 2 


Here is a very simple problem:

$$\min_x x, \quad x \ge 0.$$

By (6),

$$\psi(x; \mu) = x - \mu \log x.$$

Setting $\mu = \mu_k$, we consider $0 = \frac{d\psi}{dx} = 1 - \mu_k / x$, from which we get $x_k = \mu_k \to 0 = x^*$.


Clearly, as $\mu_k \downarrow 0$ we expect numerical trouble, just like for the penalty method in Section 1. The algorithmic framework is also the same as for the penalty method, with $\psi$ replacing $\phi$ there.
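The trouble can be seen already on the problem of Example 2. The sketch below applies Newton's method to $\psi(x; \mu) = x - \mu \log x$ with warm starts; the step-halving safeguard that keeps $x > 0$, the tolerance and the sequence $\mu_k = 10^{-k}$ are assumptions made for this illustration. An undamped Newton step from the previous minimizer $x = 10\mu$ would land at $2x - x^2/\mu = -80\mu$, outside the feasible region, which is why some damping is needed.

```python
def barrier_newton(x, mu, tol=1e-10, max_iter=100):
    """Newton's method for psi(x; mu) = x - mu*log(x), keeping x > 0."""
    steps = 0
    g = 1.0 - mu / x                      # psi'(x)
    while abs(g) > tol and steps < max_iter:
        p = -g * x * x / mu               # Newton direction: -psi' / psi''
        alpha = 1.0
        while x + alpha * p <= 0.0:       # damp the step to stay interior
            alpha *= 0.5
        x += alpha * p
        steps += 1
        g = 1.0 - mu / x
    return x, steps

x = 1.0                                   # strictly feasible starting point
for k in range(1, 9):
    mu = 10.0 ** (-k)
    x, steps = barrier_newton(x, mu)      # warm start from previous x(mu)
    print(f"mu={mu:.0e}  x={x:.3e}  newton_steps={steps}  mu/x={mu / x:.10f}")
```

Each subproblem returns $x_k = \mu_k$ to within the tolerance, and the ratio $\mu_k / x_k$ stays at 1, the Lagrange multiplier of the constraint $x \ge 0$ for this problem.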


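Note that

$$\nabla \psi(x; \mu_k) = \nabla f(x) - \sum_{i=1}^m \frac{\mu_k}{c_i(x)} \nabla c_i(x).$$

Comparing this to the gradient of the Lagrangian (1d) we expect that, as $\mu_k \downarrow 0$,

$$\frac{\mu_k}{c_i(x_k)} \to \lambda_i^*, \quad i = 1, 2, \ldots, m.$$

For the strictly inactive constraints ($c_i > 0$) we get $\lambda_i = 0$ in (7), as we should.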
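A small example with one active and one inactive constraint shows both limits at once. The problem used here ($\min x$ subject to $x \ge 0$ and $5 - x \ge 0$, so $x^* = 0$, $\lambda_1^* = 1$, $\lambda_2^* = 0$) is a hypothetical illustration, chosen because the barrier minimizer has a closed form.

```python
import math

def barrier_minimizer(mu):
    """Minimizer of psi(x; mu) = x - mu*log(x) - mu*log(5 - x) on (0, 5).

    Setting psi'(x) = 1 - mu/x + mu/(5 - x) = 0 gives the quadratic
    x^2 - (5 + 2*mu)*x + 5*mu = 0; the interior minimizer is its smaller root.
    """
    b = 5.0 + 2.0 * mu
    larger_root = 0.5 * (b + math.sqrt(b * b - 20.0 * mu))
    return 5.0 * mu / larger_root     # product of roots is 5*mu (stable form)

for k in range(1, 7):
    mu = 10.0 ** (-k)
    x = barrier_minimizer(mu)
    lam1 = mu / x             # estimate for the active constraint   c1 = x
    lam2 = mu / (5.0 - x)     # estimate for the inactive constraint c2 = 5 - x
    print(f"mu={mu:.0e}  x={x:.3e}  lam1={lam1:.8f}  lam2={lam2:.3e}")
```

The estimate for the active constraint tends to 1 and the one for the inactive constraint tends to 0 like $\mu/5$; their difference equals 1 for every $\mu$, which is the stationarity condition of the barrier function.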


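It is easy to see that, sufficiently close to the optimal solution,

$$\begin{aligned} \nabla^2 \psi(x; \mu_k) &\approx \nabla^2 L(x^*, \lambda^*) + \sum_{i=1}^m \frac{1}{\mu_k} (\lambda_i^*)^2 \nabla c_i(x) \nabla c_i(x)^T \\ &= \nabla^2 L(x^*, \lambda^*) + \sum_{i \in \mathcal{A}(x^*)} \frac{1}{\mu_k} (\lambda_i^*)^2 \nabla c_i(x) \nabla c_i(x)^T. \end{aligned} \tag{7}$$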
This expresses ill-conditioning exactly as in the penalty case (unless there are no active constraints at all). 


Let us denote by $x(\mu)$ the minimizer of $\psi(x; \mu)$, and let (for $\mu > 0$)

$$\lambda_i(\mu) = \frac{\mu}{c_i(x(\mu))}, \quad i = 1, 2, \ldots, m. \tag{8}$$

Then $\nabla_x L(x(\mu), \lambda(\mu)) = \nabla \psi = 0$. Also, $c(x) > 0$, $\lambda > 0$. So, all KKT conditions hold except for complementarity. In place of complementarity we have

$$c_i(x(\mu)) \lambda_i(\mu) = \mu > 0, \quad i = 1, 2, \ldots, m.$$

We are therefore on the central path defined artificially by (8),

$$C_{pd} = \{ (x(\mu), \lambda(\mu), s(\mu)) \mid \mu > 0 \}, \tag{9}$$

where $s = c(x)$ are the slack variables. For the primal-dual formulation we can therefore write the above as

$$\nabla f(x) - A(x)^T \lambda = 0, \tag{10a}$$
$$c(x) - s = 0, \tag{10b}$$
$$\Lambda S e = \mu e, \tag{10c}$$
$$\lambda > 0, \quad s > 0, \tag{10d}$$

where, as in Section 8.2, $\Lambda$ and $S$ are diagonal matrices with entries $\lambda_i$ and $s_i$, respectively, and $e = (1, 1, \ldots, 1)^T$.


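The setup is therefore that of primal-dual interior point methods. A modified Newton iteration for the equalities in (10) reads

$$\begin{pmatrix} \nabla_x^2 L(x, \lambda) & -A^T(x) & 0 \\ A(x) & 0 & -I \\ 0 & S & \Lambda \end{pmatrix} \begin{pmatrix} \delta x \\ \delta \lambda \\ \delta s \end{pmatrix} = \begin{pmatrix} -\nabla f + A^T \lambda \\ s - c \\ \mu e - \Lambda S e + r_{\lambda,s} \end{pmatrix}. \tag{11}$$

The modification term $r_{\lambda,s}$ (which does not come out of Newton's methodology at all!) turns out to be crucial for both theory and practice.

Upon solving (11) we update

$$x \leftarrow x + \alpha\,\delta x, \quad \lambda \leftarrow \lambda + \alpha\,\delta \lambda, \quad s \leftarrow s + \alpha\,\delta s,$$

choosing $\alpha$ so that the inequalities (10d) are obeyed.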
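A minimal sketch of such an iteration on a one-variable problem is given below. Several ingredients are assumptions made for this illustration and not prescribed by the text: the test problem ($\min \tfrac12 (x+1)^2$ subject to $x \ge 0$, with $x^* = 0$, $\lambda^* = 1$), the rule $\mu = \sigma \lambda s$ with $\sigma = 0.1$, the 0.995 fraction-to-the-boundary factor used to pick $\alpha$, and setting the modification term to zero.

```python
def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= factor * M[col][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z

# Hypothetical problem: min 0.5*(x + 1)^2  s.t.  c(x) = x >= 0.
def primal_dual(x=1.0, lam=1.0, s=1.0, sigma=0.1, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        r_dual = (x + 1.0) - lam          # grad f - A^T lam, with A = [1]
        r_prim = x - s                    # c(x) - s
        if max(abs(r_dual), abs(r_prim), lam * s) <= tol:
            break
        mu = sigma * lam * s              # assumed centering rule
        K = [[1.0, -1.0, 0.0],            # [hess L, -A^T,  0     ]
             [1.0, 0.0, -1.0],            # [A,       0,   -I     ]
             [0.0, s, lam]]               # [0,       S,    Lambda]
        rhs = [-r_dual, -r_prim, mu - lam * s]   # modification term taken as 0
        dx, dlam, ds = solve_linear(K, rhs)
        alpha = 1.0                       # keep lam > 0 and s > 0, as in (10d)
        for v, dv in ((lam, dlam), (s, ds)):
            if dv < 0.0:
                alpha = min(alpha, -0.995 * v / dv)
        x, lam, s = x + alpha * dx, lam + alpha * dlam, s + alpha * ds
    return x, lam, s

x, lam, s = primal_dual()
print(f"x={x:.3e}  lambda={lam:.10f}  s={s:.3e}")
```

On this problem the iterates stay strictly positive in $\lambda$ and $s$ while the product $\lambda s$ shrinks by roughly a factor of $\sigma$ per step.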


3 Augmented Lagrangian Method 


Consider the problem with only equality constraints, (1). The basic difficulty with the quadratic penalty method has been 
that elusive limit of dividing 0 by 0. Let us therefore consider instead adding the same penalty term to the Lagrangian, 


rather than to the objective function. 


Thus, define the augmented Lagrangian

$$\begin{aligned} L_A(x, \lambda; \mu) &= L(x, \lambda) + \frac{1}{2\mu} \sum_{i=1}^m c_i^2(x) \\ &= f(x) - \sum_{i=1}^m \lambda_i c_i(x) + \frac{1}{2\mu} \sum_{i=1}^m c_i^2(x). \end{aligned} \tag{12}$$

The KKT conditions require, to recall, that $\nabla_x L(x^*, \lambda^*) = 0$ and $c(x^*) = 0$, so at the optimum the augmented Lagrangian coincides with the Lagrangian, and $\mu$ no longer need be small.


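In fact, at some non-critical point,

$$\nabla L_A(x, \lambda; \mu) = \nabla f(x) - \sum_{i=1}^m \left[ \lambda_i - \frac{c_i(x)}{\mu} \right] \nabla c_i(x),$$

hence we expect near a critical point that

$$\lambda_i - \frac{c_i(x)}{\mu} \approx \lambda_i^*, \quad i = 1, \ldots, m. \tag{13}$$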


We can choose some $\mu > 0$ not very small. The minimization of the augmented Lagrangian (12) therefore yields a stabilization, replacing $\nabla_x^2 L$ by $\nabla_x^2 L + \frac{1}{\mu} A^T A$. Thus, the Hessian matrix (with respect to $x$) of the augmented Lagrangian, $\nabla_x^2 L_A$, is s.p.d. provided that the reduced Hessian matrix of the Lagrangian, $Z^T (\nabla_x^2 L) Z$, is s.p.d., where the columns of $Z$ span the null space of $A$. It can further be shown that for $\mu$ small enough the minimization of (12) with respect to $x$ at $\lambda = \lambda^*$ yields $x^*$.


Moreover, the formula (13) suggests a way to update $\lambda$ in a penalty-like sequence of iterates:

$$\lambda_i^{k+1} = \lambda_i^k - \frac{c_i(x_k^*)}{\mu_k}.$$

In fact, we should update also $\mu$ while updating $\lambda$. This then leads to the following algorithmic framework:


Given $\mu_0 > 0$, $x_0$, $\lambda_0$, and a final tolerance tol,

For $k = 0, 1, 2, \ldots$

    Starting with $x_k$, minimize $L_A(x, \lambda_k; \mu_k)$, terminating when
        $\|\nabla_x L_A(x, \lambda_k; \mu_k)\| \le \tau_k$.
    Call the result $x_k^*$.

    If final convergence test holds (e.g. $\tau_k \le$ tol) exit

    Else

        $\lambda_{k+1} = \lambda_k - \mu_k^{-1} c(x_k^*)$

        Choose $\mu_{k+1} \in (0, \mu_k)$

        Choose a new starting point $x_{k+1}$, e.g. $x_{k+1} = x_k^*$.
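The framework above can be sketched as follows. Everything specific here is an assumption for illustration: the test problem (a convex quadratic with one linear constraint, $x^* = (2/3, 1/3)$, $\lambda^* = 4/3$), the constant inner tolerance, the constraint-violation exit test, and the choice to apply the multiplier update before testing for exit so that the returned multiplier is the most recent one.

```python
# Hypothetical test problem (an assumption, not from the text):
#   min f(x) = x1^2 + 2*x2^2   s.t.  c(x) = x1 + x2 - 1 = 0.

def c(x):
    return x[0] + x[1] - 1.0

def grad_LA(x, lam, mu):
    # gradient of L_A = f - lam*c + c^2/(2*mu); grad c = (1, 1)
    w = -lam + c(x) / mu
    return [2.0 * x[0] + w, 4.0 * x[1] + w]

def minimize_LA(x, lam, mu, tau, max_iter=50):
    """Newton's method on L_A(., lam; mu) until ||grad|| <= tau."""
    t = 1.0 / mu
    H = [[2.0 + t, t], [t, 4.0 + t]]          # Hessian of L_A w.r.t. x
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    g = grad_LA(x, lam, mu)
    iters = 0
    while (g[0] ** 2 + g[1] ** 2) ** 0.5 > tau and iters < max_iter:
        p0 = (-g[0] * H[1][1] + g[1] * H[0][1]) / det   # Cramer's rule
        p1 = (-g[1] * H[0][0] + g[0] * H[1][0]) / det
        x = [x[0] + p0, x[1] + p1]
        g = grad_LA(x, lam, mu)
        iters += 1
    return x

def augmented_lagrangian(x0, lam0=0.0, mu0=1.0, tol=1e-8, max_outer=20):
    x, lam, mu = list(x0), lam0, mu0
    history = []
    for _ in range(max_outer):
        x = minimize_LA(x, lam, mu, tol)
        viol = abs(c(x))
        lam = lam - c(x) / mu             # multiplier update
        history.append((mu, lam, viol))
        if viol <= tol:                   # assumed final convergence test
            break
        mu = 0.1 * mu                     # gentle decrease of mu
    return x, lam, history

x, lam, history = augmented_lagrangian([0.0, 0.0])
for mu, lam_k, viol in history:
    print(f"mu={mu:.0e}  lambda={lam_k:.10f}  |c(x)|={viol:.2e}")
```

For this problem the multiplier error shrinks by the factor $4\mu_k / (4\mu_k + 3)$ at each outer iteration, so the multiplier is already accurate to about $10^{-10}$ with $\mu$ no smaller than $10^{-4}$, where the quadratic penalty estimate $-c/\mu$ would still be off by about $10^{-4}$.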


This allows a gentler decrease of $\mu$: both primal and dual variables participate in the iteration. The constraints satisfy

$$\frac{c_i(x_k^*)}{\mu_k} = \lambda_i^k - \lambda_i^{k+1} \to 0, \quad 1 \le i \le m,$$

a clear improvement over the expression (3) relevant for the quadratic penalty method.
Example 3

Let us repeat the experiments of Example 1 using the augmented Lagrangian method. We use the same parameters and starting points, with $\lambda_0 = 0$.

For the quadratic 4-variable objective function we obtain convergence after a total of 29 Newton iterations. No damping was needed. The resulting penalty parameter sequence is

$\mu$ = 1, .1, .01, ..., 1.e-5,

and the corresponding Lagrange multiplier estimates are

$\lambda$ = 0, -2.09, -12.78, -32.85, -40.59, -40.94.

For the non-quadratic objective function we obtain convergence after a total of 28 iterations. The penalty parameter sequence is

$\mu$ = 1, .1, .01, .001, 1.e-4,

and the corresponding Lagrange multiplier estimates are

$\lambda$ = 0, 9.7e-7, -2.63, -3.33, -3.35.

The smallest values of $\mu$ required here are much larger than in Example 1, and no difficulty is encountered in the path to convergence for the augmented Lagrangian method: the advantage over the penalty method of Example 1 is more than the iteration counts alone indicate.


It is possible to extend the augmented Lagrangian method directly for inequality constraints [26]. But instead we can use slack variables. Thus, for a given constraint

$$c_i(x) \ge 0, \quad i \in \mathcal{I},$$

we write

$$c_i(x) - s_i = 0, \quad s_i \ge 0.$$

For the general problem we obtain a problem with equality constraints plus nonnegativity constraints,

$$\min_{x, s} f(x), \tag{14a}$$
$$c_i(x) = 0, \quad i \in \mathcal{E}, \tag{14b}$$
$$c_i(x) - s_i = 0, \quad i \in \mathcal{I}, \tag{14c}$$
$$s \ge 0. \tag{14d}$$


For the latter problem we can utilize a mix of the augmented Lagrangian method applied for the equality constraints and the gradient projection method, as described in Section 9.3, applied for the nonnegativity constraints.

This is the approach taken by the highly successful general-purpose code LANCELOT by Conn, Gould and Toint [11]. In the algorithmic framework presented earlier we now have the subproblem

$$\min_{x, s} L_A(x, s, \lambda; \mu) \tag{15}$$
$$\text{s.t. } s \ge 0,$$

where $\lambda$ and $\mu$ are held fixed when (15) is solved.
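To make the structure of this bound-constrained subproblem concrete, here is a small sketch that combines multiplier updates with a plain gradient projection inner loop. It is not the algorithm implemented in LANCELOT; the test problem ($\min \tfrac12 (x+1)^2$ subject to $x \ge 0$, rewritten with a slack as $x - s = 0$, $s \ge 0$), the fixed value of $\mu$, the fixed step length and the iteration limits are all assumptions chosen for illustration.

```python
# Hypothetical problem: min 0.5*(x + 1)^2  s.t.  c(x) = x >= 0.
# With a slack s:  x - s = 0,  s >= 0;  solution x* = s* = 0, lambda* = 1.

def grad_LA(x, s, lam, mu):
    # L_A(x, s, lam; mu) = 0.5*(x+1)^2 - lam*(x - s) + (x - s)^2 / (2*mu)
    r = (x - s) / mu
    return (x + 1.0) - lam + r, lam - r

def solve_subproblem(x, s, lam, mu, step, tol=1e-10, max_iter=10000):
    """Gradient projection for min L_A over (x, s) subject to s >= 0."""
    for _ in range(max_iter):
        gx, gs = grad_LA(x, s, lam, mu)
        x_new = x - step * gx
        s_new = max(0.0, s - step * gs)      # projection onto s >= 0
        moved = max(abs(x_new - x), abs(s_new - s))
        x, s = x_new, s_new
        if moved <= tol * step:              # projected gradient is small
            break
    return x, s

def al_with_slack(mu=0.5, n_outer=30):
    x, s, lam = 1.0, 1.0, 0.0
    # 1 + 2/mu bounds the Hessian eigenvalues of L_A (Gershgorin), so this
    # step length is safe for the quadratic subproblem.
    step = 1.0 / (1.0 + 2.0 / mu)
    for _ in range(n_outer):
        x, s = solve_subproblem(x, s, lam, mu, step)   # lam, mu held fixed
        lam = lam - (x - s) / mu             # multiplier update for x - s = 0
    return x, s, lam

x, s, lam = al_with_slack()
print(f"x={x:.3e}  s={s:.3e}  lambda={lam:.10f}")
```

The bound $s \ge 0$ is handled entirely by the projection in the inner loop, while the equality $x - s = 0$ is enforced only through the multiplier and penalty terms; here the multiplier converges with $\mu$ held at 0.5.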